NSF PAR Search | NSF Public Access Repository

Progressive and Scalable Hotspot Detection through Local K-Function in Spatial Networks

https://doi.org/10.1145/3766549

Kang, Yunfan; Liu, Yongyi; Juvekar, Pratham; Mahmood, Ahmed; Wang, Shaowen; Magdy, Amr (September 2025, ACM Transactions on Spatial Algorithms and Systems)

The widespread availability of geotagged data, combined with modern map services, allows data to be attached accurately to spatial networks. Statistical analysis over spatial networks, such as hotspot detection, is important for precise quantification and pattern analysis, which supports effective decision-making in many applications. Existing hotspot detection algorithms on spatial networks either lack sufficient statistical evidence for the detected hotspots, as with clustering, or provide that evidence at a prohibitive computational overhead. In this paper, we propose efficient algorithms for detecting hotspots based on the network local K-function for predefined and unknown hotspot radii. The K-function is a widely adopted statistical approach for network pattern analysis that enables the understanding of the density and distribution of activities and events happening within the spatial network. However, its practical application has been limited by the inefficiency of state-of-the-art algorithms, particularly for large networks. Extensive experimental evaluation using real and synthetic datasets shows that our algorithms are up to 28 times faster than the state-of-the-art algorithms in computing hotspots with a predefined radius and up to more than four orders of magnitude faster in identifying hotspots without a predefined radius. Additionally, to address dynamic changes in the spatial network, we propose an incremental hotspot detection approach that efficiently updates hotspot computations by leveraging prior results as new events are added.
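To make the quantity behind the network local K-function concrete, here is a minimal sketch of a naive baseline. It is not the algorithm proposed in the paper. It assumes that events are snapped to network nodes, that edges are undirected with non-negative lengths, and that the unnormalized count of neighbouring events is enough for illustration. The function name `local_k_counts` and its parameters are hypothetical.

```python
import heapq
from collections import defaultdict


def local_k_counts(edges, events, radius):
    """For each event, count the other events whose shortest-path
    network distance from it is at most `radius` (unnormalized)."""
    # Undirected weighted adjacency list built from (u, v, length) triples.
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Assumption: every event sits exactly on a node.
    events_at = defaultdict(int)
    for node in events:
        events_at[node] += 1

    def bounded_dijkstra(source):
        # Dijkstra that never expands beyond `radius`.
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale queue entry
            for v, w in adj[u]:
                nd = d + w
                if nd <= radius and nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    counts = []
    for node in events:
        reachable = bounded_dijkstra(node)
        total = sum(events_at.get(n, 0) for n in reachable)
        counts.append(total - 1)  # exclude the event itself
    return counts
```

This baseline runs one bounded shortest-path search per event, which is the kind of repeated work that becomes expensive on large networks.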
Free, publicly-accessible full text available September 11, 2026
On scalable DCEL overlay operations

https://doi.org/10.1007/s10707-025-00539-x

Calderon-Romero, Andres; Abdelhafeez, Laila; Trajcevski, Goce; Magdy, Amr; Tsotras, Vassilis J (July 2025, GeoInformatica)

The Doubly Connected Edge List (DCEL) is an edge-list structure widely used in spatial applications, primarily for planar topological and geometric computations. However, it is also applicable to various types of data, including 3D models and geographic data. An essential operation is the overlay operation, which combines the DCELs of two input polygon layers and can easily support spatial queries on polygons like the intersection, union, and difference between these layers. However, existing techniques for spatial overlay operations suffer from two main limitations. First, they fail to handle many large datasets practically used in real applications. Second, they cannot handle arbitrary spatial lines that practically form polygons, e.g., city blocks, but they are given as a set of scattered lines. This work proposes a distributed and scalable way to compute the overlay operation and its related supported queries. Our operations also support arbitrary spatial lines through a scalable polygonization process. We address the issues of efficiently distributing the lines and overlay operators and offer various optimizations that improve performance. Our experiments demonstrate that the proposed scalable solution can efficiently compute the overlay of large real datasets.
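For readers unfamiliar with the structure, the following is a minimal single-machine sketch of a DCEL for one simple polygon. It is not the distributed method described in the abstract. The names `HalfEdge`, `build_polygon_dcel`, and `face_boundary` are hypothetical, and the sketch assumes the vertices are given in boundary order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(eq=False)  # compare half-edges by identity, not by field values
class HalfEdge:
    origin: Point
    twin: Optional["HalfEdge"] = None
    next: Optional["HalfEdge"] = None
    prev: Optional["HalfEdge"] = None
    face: Optional[str] = None


def build_polygon_dcel(vertices: List[Point], face_name: str = "inner") -> HalfEdge:
    """Build the half-edges of one polygon; return an inner half-edge."""
    n = len(vertices)
    # inner[i] runs from vertices[i] to vertices[i + 1].
    inner = [HalfEdge(origin=vertices[i], face=face_name) for i in range(n)]
    # outer[i] is the twin of inner[i] and runs the opposite way.
    outer = [HalfEdge(origin=vertices[(i + 1) % n], face="outer") for i in range(n)]
    for i in range(n):
        inner[i].twin, outer[i].twin = outer[i], inner[i]
        inner[i].next = inner[(i + 1) % n]
        inner[i].prev = inner[(i - 1) % n]
        # The outer boundary is traversed in the reverse direction.
        outer[i].next = outer[(i - 1) % n]
        outer[i].prev = outer[(i + 1) % n]
    return inner[0]


def face_boundary(start: HalfEdge) -> List[Point]:
    """Walk `next` pointers from `start` and collect the origin vertices."""
    points = []
    edge = start
    while True:
        points.append(edge.origin)
        edge = edge.next
        if edge is start:
            break
    return points
```

Because each half-edge stores its twin, next, prev, and face, walking a face boundary or stepping across an edge into the neighbouring face takes constant time per step, which is the property overlay operations rely on.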
Free, publicly-accessible full text available July 1, 2026
Pyneapple-G: Scalable Spatial Grouping Queries

Abdelhafeez, Laila; Calderon, Andres; Magdy, Amr; Tsotras, Vassilis J (September 2024, Proceedings of the VLDB Endowment)

This paper demonstrates Pyneapple-G, an open-source library for scalable spatial grouping queries based on Apache Sedona (formerly known as GeoSpark). We demonstrate two modules, namely, SGPAC and DDCEL, that support grouping points, grouping lines, and polygon overlays. The SGPAC module provides a large-scale grouping of spatial points by highly complex polygon boundaries. The grouping results aggregate the number of spatial points within the boundaries of each polygon. The DDCEL module provides the first parallelized algorithm to group spatial lines into a DCEL data structure and discovers planar polygons from scattered line segments. Exploiting the scalable DCEL, we support scalable overlay operations over multiple polygon layers to compute the layers’ intersection, union, or difference. To showcase Pyneapple-G, we have developed a frontend web application that enables users to interact with these modules, select their data layers or data points, and view results on an interactive map. We also provide interactive notebooks demonstrating the superiority and simplicity of Pyneapple-G to help social scientists and developers explore its full potential.
Full Text Available
Preparing for a Career at the Intersection of Geography and Computing: Availability and Access to Training Along Geocomputational Career Pathways

https://doi.org/10.1080/00330124.2024.2404911

Nara, Atsushi; Embury, Jessica; Velasco, Matthew; Russell, Rachel; Magdy, Amr; Dony, Coline C (October 2024, The Professional Geographer)

Full Text Available
Pyneapple-G: Scalable Spatial Grouping Queries

https://doi.org/10.14778/3685800.3685902

Abdelhafeez, Laila; Calderon-Romero, Andres; Magdy, Amr; Tsotras, Vassilis J (August 2024, Proceedings of the VLDB Endowment)

This paper demonstrates Pyneapple-G, an open-source library for scalable spatial grouping queries based on Apache Sedona (formerly known as GeoSpark). We demonstrate two modules, namely, SGPAC and DDCEL, that support grouping points, grouping lines, and polygon overlays. The SGPAC module provides a large-scale grouping of spatial points by highly complex polygon boundaries. The grouping results aggregate the number of spatial points within the boundaries of each polygon. The DDCEL module provides the first parallelized algorithm to group spatial lines into a DCEL data structure and discovers planar polygons from scattered line segments. Exploiting the scalable DCEL, we support scalable overlay operations over multiple polygon layers to compute the layers' intersection, union, or difference. To showcase Pyneapple-G, we have developed a frontend web application that enables users to interact with these modules, select their data layers or data points, and view results on an interactive map. We also provide interactive notebooks demonstrating the superiority and simplicity of Pyneapple-G to help social scientists and developers explore its full potential.
Full Text Available
Pyneapple-L: Scalable Expressive Learning-based Spatial Analysis

https://doi.org/10.1145/3678717.3691228

Liu, Yongyi; Lee, Nicolas; Kang, Yunfan; Shahneh, Mohammad Reza; Mahmood, Ahmed; Chinnam, Vishal Rohith; Sarawadekar, Aparna Vivek; Oymak, Samet; Sabek, Ibrahim; Magdy, Amr (October 2024, ACM)

Full Text Available
Scalable Spatio-temporal Top-k Interaction Queries on Dynamic Communities

https://doi.org/10.1145/3648374

Almaslukh, Abdulaziz; Liu, Yongyi; Magdy, Amr (March 2024, ACM Transactions on Spatial Algorithms and Systems)

Social media platforms generate massive amounts of data that reveal valuable insights about users and communities at large. Existing techniques have not fully exploited such data to help practitioners perform a deep analysis of large online communities. Lack of scalability hinders analyzing communities of large sizes and requires tremendous system resources and unacceptable runtime. This article proposes a new analytical query that identifies the top-k posts that a given user community has interacted with during a specific time interval and within a spatial range. We propose a novel indexing framework that captures the interactions of users and communities to provide a low query latency. Moreover, we propose exact and approximate algorithms to process the query efficiently and utilize the index content to prune the search space. The extensive experimental evaluation on real data has shown the superiority of our techniques and their scalability to support large online communities.
Full Text Available
Pyneapple-R: Scalable and Expressive Spatial Regionalization

https://doi.org/10.1109/ICDE60146.2024.00437

Kang, Yunfan; Liu, Yongyi; Alrashid, Hussah; Bilgi, Akash; Purohit, Siddhant; Mahmood, Ahmed; Rey, Sergio; Magdy, Amr (May 2024, IEEE)

Full Text Available
On the Ecosystem of High-Definition (HD) Maps

https://doi.org/10.1109/ICDEW61823.2024.00010

Zhu, Yuanjie; Alrashid, Hussah; Bai, Song; Zhang, Chunhan; Zhang, Ziliang; Qu, Zhengyi; Ali, Reem Y; Magdy, Amr (May 2024, IEEE)

Full Text Available
Statistical Inference for Spatial Regionalization

https://doi.org/10.1145/3589132.3625608

Alrashid, Hussah; Magdy, Amr; Rey, Sergio (November 2023, The 31st ACM International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL '23))

The process of regionalization involves clustering a set of spatial areas into spatially contiguous regions. Given the NP-hard nature of regionalization problems, all existing algorithms yield approximate solutions. To ascertain the quality of these approximations, it is crucial for domain experts to obtain statistically significant evidence on optimizing the objective function, in comparison to a random reference distribution derived from all potential sample solutions. In this paper, we propose a novel spatial regionalization problem, denoted as SISR (Statistical Inference for Spatial Regionalization), which generates random sample solutions with a predetermined region cardinality. The driving motivation behind SISR is to conduct statistical inference on any given regionalization scheme. To address SISR, we present a parallel technique named PRRP (P-Regionalization through Recursive Partitioning). PRRP operates over three phases: the region-growing phase constructs initial regions with a predetermined region cardinality, while the region-merging and region-splitting phases ensure the spatial contiguity of unassigned areas, allowing for the growth of subsequent regions with predetermined cardinalities. An extensive evaluation shows the effectiveness of PRRP using various real datasets.
Full Text Available
